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With the help of our recently developed massively parallel DGDFT (Discontinuous Galerkin Density Functional Theory) method¬ 
ology, we perform large-scale Kohn-Sham density functional theory calculations on phosphorene nanoribbons with armchair 
edges (ACPNRs) containing a few thousands to ten thousand atoms. The use of DGDFT allows us to systematically achieve 
conventional plane wave basis set type of accuracy, but with a much smaller number (about 15) of adaptive local basis (ALB) 
functions per atom for this system. The relatively small number degrees of freedom required to represent the Kohn-Sham 
Hamiltonian, together with the use of the pole expansion the selected inversion (PEXSI) technique that circumvents the need to 
diagonalize the Hamiltonian, result in a highly efficient and scalable computational scheme for analyzing the electronic structures 
of ACPNRs as well as its dynamics. The total wall clock time for calculating the electronic structures of large-scale ACPNRs 
containing 1080-10800 atoms is only 10-25 s per self-consistent field (SCF) iteration, with accuracy fully comparable to that 
obtained from conventional planewave DFT calculations. For the ACPNR system, we observe that the DGDFT methodology 
can scale to 5,000-50,000 processors. We use DGDFT based ab-initio molecular dynamics (AIMD) calculations to study the 
thermodynamic stability of ACPNRs. Our calculations reveal that a 2 x 1 edge reconstruction appears in ACPNRs at room 
temperature. 


1 Introduction 

Kohn-Sham density functional theory (DFT)GE1 

is the most 

widely used methodology for performing ah initio electronic 
structure calculations to study the structural and electronic 
properties of molecules, solids and nanomaterials. However, 
until recently, DFT calculations are limited to small systems 
because they have a relatively high complexity 
with the system size N. As the system size increases, the cost 
of traditional DFT calculations becomes prohibitively expen¬ 
sive. Therefore, it is still challenging to use DFT calculations 
to treat large-scale systems that may contain thousand or tens 
of thousands of atoms. Although various linear scaling ) 
methodsl^® have been proposed for improving the efficiency 
of DFT calculations, they rely on the nearsightedness princi¬ 
ple, which leads to exponentially localized density matrices in 
real-space for systems with a finite energy gap or at finite tem¬ 
perature. On the other hand, most of the existiM linear scal¬ 
ing DFT codes, such as SIESTA,E1 cONQUEST,|3oPENMxE 1 
and HONPAS,l^are based on the contracted and localized ba¬ 
sis sets in the real-space, such as Gaussian-type orbitals or nu- 
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merical atomic orbitals.Ett is relatively difficult to improve the 
accuracy of methods based on such contracted basis functions 
in a systematic fashion compared to methods based on con¬ 
ventional uniform basis sets, for example, the planewave basis 
set,n^. The disadvantage of using uniform basis sets is the 
relatively large number of basis functions required per atom. 

Recently, we have developed a massively parallel DGDFT 
(Discontinuous Galerkin Density Functional Theory) method¬ 
ology for performing efficient large-scale Kohn-Sham DFT 
calculations. The methodology is based on the combination 
of the adaptive local basis (ALB) setl^^ and the wle expan¬ 
sion and selected inversion (PEXSI) technique.! xhe ALB 
functions are localized in the real space and discontinuous 
in the global domain. The continuous Kohn-Sham orbitals 
and density are assembled from the discontinuous b asis fu nc- 
tions using the discontinuous Galerkin (DG) method.EEl Be¬ 
cause it is rooted in a domain decomposition approach that 
takes the chemical environment effects into account, the ALB 
set constructed by the DGDET methodology is systematically 
improvable. It can achieve the same level of accuracy ob¬ 
tained by conventional plane wave calculations with much 
fewer number of basis functions. The sparse Hamiltonian 
matrix generated from DGDET can take advantage of the 
PEXSI method. The PEXSI method overcomes the 
scaling limit for solving Kohn-Sham DET, and scales at most 
as even for metallic systems at room temperature. In 





particular, the computational complexity of the PEXSl method 
is only i^{N) for ID systems, and is for 2D systems. 

This also makes the DGDFT methodology particularly suit¬ 
able for analyzing low-dimensional (ID and 2D) systems re¬ 
gardless whether the system is a metal, a semiconductor or an 

insulator.E] 

In this paper, we demonstrate the accuracy and efficiency 
of DGDFT by using it to analyze the electronic structures and 
thermodynamic stability of armchair phosphorene nanorib¬ 
bons (ACPNRs), which is an interesting ID derivative of phos¬ 
phorene with some remarkable properties. We use DGDFT to 
perform both static electronic structure calculations as well 
as ah initio molecular dynamics (AIMD) calculations. Our 
AIMD calculations reveal that a 2 x 1 edge reconstruction ap¬ 
pears in the edge unpassivated ACPNRs at room temperature. 

The paper is organized as follows. In section we in¬ 
troduce our recently developed massively parallel DGDFT 
methodology for efficient large-scale Kohn-Sham DFT based 
electronic structure calculations. In section we provide 
some background on phosphorene nanoribbons that we exam¬ 
ine. We report the results obtained from applying DGDFT 
to ACPNRs in section |4l We demonstrate that the DGDFT 
methodology can achieve high accuracy with much fewer ba¬ 
sis functions compared to the conventional plane wave dis¬ 
cretized calculations. We also show that DGDFT can han¬ 
dle large ACPNRs systems with thousand or even tens of 
thousands of atoms. Furthermore, we show that the DGDFT 
methodology is highly scalable on modern high performance 
computers because it contains multiple levels of paralleliza¬ 
tion. Finally, we show that by using DGDFT based ab-initio 
molecular dynamics (AIMD) calculations, we are able to iden¬ 
tify a 2 X 1 edge reconstruction in the edge-unpassivated 
ACPNRs at room temperature. This observation suggests that 
PNRs may modify their electronic structures over time, hence 
are suitable phosphorene-based candidate materials for nano¬ 
electronics. 


2 DGDFT Methodology 

In this section, we briefly present the mathematical founda¬ 
tion and algorithmic ingredients of the DGDFT methodol¬ 
ogy. DGDFT constructs adaptive local basis set (ALB) in 
the discontinuous Galerkin (DG) framework.Q^ We explain 
why the implementation of DGDFT is highly scalable on mas¬ 
sively parallel computers. Because the sparse Hamiltonian 
constructed by DGDFT can take full advantage of the recently 
developed pole expansion and selected inversion (PEXSl) 
methodto overcome the scaling of diagonaliza- 

tion methods, it can be used to study the electronic structures 
and ah initio molecular dynamics (AIMD) of large-scale atom¬ 
istic systems. 


2.1 Adaptive local basis set in a discontinuous Galerkin 
framework 

In our recent work,li3 we have presented a new way to dis¬ 
cretize the Kohn-Sham Hamiltonian, called the adaptive lo¬ 
cal basis functions (ALB). The basic idea of ALB is to use 
eigenfunctions of the Kohn-Sham Hamiltonian dehned on lo¬ 
cal domains to construct basis functions. Compared to atom- 
centered basis functions such as Gaussian type orbitals and 
numerical atomic orbitals, such procedure encodes not only 
atomic structure but also environmental effects into the ba¬ 
sis functions. In practice, we partition the global computa¬ 
tional domain into a number of subdomains (called elements). 
Then we define a buffer area for each element that typically 
includes its nearest neighbor elements. We refer to the ele¬ 
ment together with its buffer area as an extended element. For 
instance. Fig. [T] shows an ACPNR with 54 P atoms (P 54 sys¬ 
tem) partitioned along the Z-direction into 5 elements. The 
extended element associated with the second element £2 con¬ 
tains elements Ei,E 2 ,Ej, and the extended element associated 
with the third element £3 contains elements £2 , £3 , £4 and so 
on. We compute eigenfunctions for a local Kohn-Sham prob¬ 
lem in each extended element with periodic boundary condi¬ 
tions using a local planewave basis set. The artihcial effect 
due to the periodic boundary condition of the extended ele¬ 
ment is reduced by restricting the point-wise values of eigen¬ 
functions from the extended element to the element, and the 
restricted eigenfunctions are mutually orthogonalized on the 
element. We call such orthogonalized functions adaptive local 
basis functions. Note that the ALB functions can be com¬ 
puted at each step of the self-consistent held (SCF) iteration 
through an efficient iterative eigensolver using e.g. locally op¬ 
timal block preconditioned conjugate gradient (LOBPCG).I^ 
Since the elements are disjoint from each other, each ALB is 
strictly zero outside its element, and is not continuous across 
the boundaries of different elements . Th erefore, we use the 
discontinuous Galerkin (DG) methodl^^ESI to construct a hnite 
dimensional Kohn-Sham Hamiltonian represented by these 
types of discontinuous basis functions. For instance, for pe¬ 
riodic systems in a norm-conserving pseudopotential frame¬ 
work, the linearized DG energy functional at each step of the 
self-consistent held (SCF) iteration becomes 

1 ^ 
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{£’i,£’ 2 j£^ 3 )£^ 4 j£^ 5 }> with the collection of all its surfaces de¬ 
noted by . The set contains the N occupied Kohn- 

Sham orbitals represented as the linear combination of ALB 
functions. We use Vgff to denote the effective one-body po¬ 
tential (including local pseudopotential, Hartree potential and 
the exchange-correlation potential) at each SCF iteration. The 
terms that contain Z?/ ^ and 7 / ^ correspond to the nonlocal pseu¬ 
dopotential. Here ^ is the sum of the inner product on 
each element, and {■,-)y is the sum of the inner product on 
each surface. {{ • }} and [[ • ]] are the average and the jump 
operators across surfaces due to the discontinuity of the basis 
functions. We refer the readers to Ref.Elfor more detailed in¬ 
formation. What distinguishes the DG formulation from the 
standard Kohn-Sham formulation of the DFT problem is the 
last two terms in Eq. 0 , which comes from the integration by 
parts of the Laplacian operator, and a penalty term to stabilize 
the numerical evaluation of the energy, respectively. The DG 
method modifies the Kohn-Sham energy functional so that the 
kinetic energy functional is well defined even with discontin¬ 
uous basis functions. The DG solution is also fully consistent 
with the solution of standard Kohn-Sham equations in the limit 
of a complete basis set, and the error can be measured by a 
posteriori error estimators .03 The ALB functions can achieve 
high accuracy (less than 1 meV per atom) in the total energy 
calculation with a very small number (4—40) of basis func¬ 
tions per atom, compared to fully converged planewave calcu¬ 
lations. 

Using a ID ACPNR (P 54 ) as an example, we show the iso¬ 
surfaces of the first three ALB functions in the second element 
in Fig. [TJa)-(c). The global computational domain is parti¬ 
tioned along the Z-direction into 5 elements. Each ALB func¬ 
tion shown is strictly localized inside the second element and 
is therefore discontinuous across the boundary of elements. 
On the other hand, each ALB function is delocalized across 
a few atoms inside the element since they are obtained from 
eigenfunctions of local Kohn-Sham Hamiltonian. Although 
the basis functions are discontinuous, the electron density is 
well-defined and is very close to be a continuous function 
in the global domain (Eig. [TJd)) once the local contributions 
are assembled. It should be noted that all ALB functions are 
by construction mutually orthogonal. Thus the corresponding 
overlap matrix is an identity matrix. Hence, this formulation 
avoids solving a generalized eigenvalue problem that has a po¬ 
tentially ill-conditioned overlap matrix. 



Fig. 1 (Color online) The isosurface of the first three ALB 
functions, (a) 0;, (b) 02, (c) 03, belonging to the second element 
and (d) the electron density p across the YZ plane in the global 
domain in the example of P54. There are 5 elements and 160 ALB 
functions in each element in the P54 system. 


parallelization is called intra-element parallelization. On top 
of this, the computation of eigenfunctions for different ele¬ 
ments, together with the construction of the DG Hamiltonian 
can be naturally parallelized at a coarse grain level. This is 
called inter-element parallelization. We optimized the data 
communication structure so that different levels of paralleliza¬ 
tion can be seamlessly and efficiently performed. We will 
demonstrate the details of the parallelization strategy on mas- 
sively parallel computers in a separate publication in prepara¬ 
tion.^ 

In the intra-element parallelization, the wavefunction and 
eigenfunctions of each extended element are distributed 
among different processors. The number of eigenfunctions 
to be computed for a single element is usually on the order of 
100 , and intra-element parallelization can scale to several hun¬ 
dred processors. The level of concurrency that can be achieved 
in the inter-element parallelization is determined by the num¬ 
ber of elements. In the DGDET method, each element usu¬ 
ally takes around 10 atoms, and for a system containing 1000 
atoms there should be around 100 elements. As a result, the 
two-level parallelization strategy can readily scale to 10,000 
processors. Eor the largest ACPNRs system studied in this 
work, the number of processors used is 50,000 processors. 


2.2 Two levels parallelization strategy 

The DGDET framework naturally allows two levels of par¬ 
allelization. Eor each element, the computation of eigen¬ 
functions for the local Kohn-Sham Hamiltonian can be paral¬ 
lelized similar to how a regular Kohn-Sham DET solver with 
planewave basis sets is parallelized. This type of fine-grained 


2.3 Pole expansion and selected inversion method 

Once the DG Hamiltonian is constructed, one can solve a stan¬ 
dard eigenvalue problem to obtain physical quantities such as 
electron density, total energy and atomic forces. This can 
be done by treating the DG Hamiltonian matrix as a dense 
matrix and by solving the eigenvalue problem via standard 



































parallel linear algebra software packages for dense matrices, 
e.g. ScaLAPACKl23 (refeiTed to as the “DIAG” method). The 
computational cost of the DIAG method scales as This 

parallel scalability of ScaLAPACK diagonalization subroutine 
is limited to a few thousands of processors. When more than 
10,000 processors are available, DIAG can become the com¬ 
putational bottleneck because it cannot take advantage of that 
many processors even though other part of the DGDFT calcu¬ 
lation become less time consuming. 

The recently developed pole expansion pole expansion and 
selected inversion (PEXSI) methodl^^®! avoids the diagonal¬ 
ization procedure completely. It evaluates physical quantities 
such as electron density, energy, atomic force without calcu¬ 
lating any eigenvalue or eigenfunction, and reduces the com¬ 
putational complexity to at most without sacrificing 

accuracy even for metallic systems. In particular, the com¬ 
putational complexity of the PEXSI method is only ff{N) for 
ID systems (such as ACPNRs studied here), and is 
for 2D systems. These are much more favorable compared 
with the complexity of the DIAG method. Therefore, 

the PEXSI method is particularly suitable to study the elec¬ 
tronic structure of larges scale low-dimensional (ID and 2D) 
systems. The PEXSI method is also highly scalable to more 
than 10,000 processors, as recently de mons trated in the mas¬ 
sively parallel SIESTA-PEXSI methodEEl based on numeri¬ 
cal atomic orbitals. Therefore the combined DGDET-PEXSI 
method can scale beyond 10,000 processors and solves elec¬ 
tronic structure problem with more than 10,000 atoms. 

3 Theoretical model of ACPNRs 

Phosphorene, a new two dimensional (2D) elemental mono- 
layer,122H251 received considerable amount of interest re¬ 
cently after it has been experimentally isolated through me¬ 
chanical exfoliation from bulk black phosphorus. Phospho¬ 
rene exhibits some remarkable electronic properties superior 
to graphene, a well known elemental sp^-hybridized carbon 
monolayer.l23(Sl Por example, phosphorene is a direct semi¬ 
conductor with a high hole mobility.1221 It has the drain current 
modulation up to 10^ in nanoelectronics.^S These remarkable 
properties have already been used for wide applications in 
field effect transistors^S! and thin-hlm solar cells. IS! Purther- 
more, up to now, phosphorene is the only stable elemental 
2D material which can be mechanically exfoliated in exper¬ 
iments®! besides graphene. Therefore, it can potentially be 
used as an alternative to graphene®! in the future and lead to 
faster semiconductor electronics. 

By cutting 2D phosphorene into hnite-sized ID phospho¬ 
rene nanoribbons (PNRs), a bandgap engineering technique 
often used for graphene®HlI! to get graphene nanoribbons 
(GNRs),®!tS!one obtains a new type of material that has been 
subject to many theoretical and experimental studies.®!®!The 


stability and electronic properties of PNRs depend sensitively 
on the ribbon width and how it is cut from the 2D phospho¬ 
rene, which can result in either armchair or zigzag shaped 
edges.®! Unlike GNRs,®!®! hydrogen-passivated PNRs with 
armchair and zigzag edges are all semiconductors with direct 
band gaps.®! For edge-unpassivated PNRs armchair edged 
PNRs (ACPNRs) are all semiconducting, but zigzag edged 
PNRs (ZZPNRs) all exhibit metallic characteristics. Eurther- 
more, it has been found that edge-unpassivated ZZPNRs ex¬ 
hibit instability at the edge boundary that may easily induce 
edge reconstruction and disorder. Using density functional 
theory (DPT) calculations, Ramasubramaniam et al.®! have 
shown that a 2 x 1 edge reconstruction appears in the edge- 
unpassivated ZZPNRs. The reconstruction induces different 
stability and electronic structures of ZZPNRs. Edge disorder 
is also observed by Guo et al.®!in the edge-unpassivated ZZP¬ 
NRs with ab-initio molecular dynamics calculations. 

However, the edge-unpassivated ACPNRs seem to be ther¬ 
modynamically stable at the edge boundary,®! and up to now, 
no edge reconstruction or disorder has been predicted theo¬ 
retically in the edge-unpassivated ACPNRs. In the present 
work, we focus on the edge-unpassivated ACPNRs because 
the hydrogen-passivated PNRs been theoretically proved to 
be very thermodynamically stable and the edge reconstruction 
has been observed in the edge-unpassivated ZZPNRs.®! 

Pig. 1^ shows the atomic conhguration of a 2D phosphorene 
monolayer in a 1 x 6 x 4 supercell and some examples of ID 
ACPNRs with a width A = 4 in the unit cell (Pig), 1 x 1 x 

3 (P 54 ) and 1x1x10 (Piso) supercells. Other ACPNRs in 
very large supercells involving thousand or tens of thousands 
of atoms, such as the 1 x 1 x 120 (P 216 o)j 1 x 1 x 240 (P 4320 ) 
and 1 X 1 X 600 (Piogoo) supercells, which we adopt in this 
work, are not shown here. The vacuum space in the X and Y 
directions is about 10 A to separate the interactions between 
neighboring slabs in ACPNRs. 

4 Results and Discussion 

In this section, we present computational results obtained by 
applying DGDET to ACPNRs of different sizes. We demon¬ 
strate the accuracy of the calculation and parallel efficiency of 
DGDET We also report a 2 x 1 edge reconstruction observed 
in a AIMD study performed to assess the thermodynamic sta¬ 
bility of ACPNRs. 

We use the conventional plane wave software pack¬ 
age ABINIT®! as a reference to check the accuracy for 
our DGDET calculations. The same exchange-correlation 
functionals, including the local density approximation of 
Goedecker, Teter, Hutter (LDA-Teter93)® and general¬ 
ized gradient approximation of Perdew, Burke, and Ernz- 
erhof (GGA-PBE),®! and the Hartwigsen-Goedecker-Hutter 
(HGH) norm-conserving pseudopotential®! are adopted in 
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Fig. 2 (Color online) Geometric structures of (a) 2D phosphorene in 
the 1 X 6 X 4 supercell and different ID ACPNRs with a width N = 
4 in the (b) unit cell (Pig), (c) 1 x 1 x 3 (P 54 ) and (d) 1 x 1 x 10 
(Pi go) supercells. The violet balls denote phosphorus atoms. Two 
types of edges, armchair and zigzag, are highlighted in the insert. 

both ABINIT and DGDFT software packages. All calcula¬ 
tions are performed on the Edison system available at the Na¬ 
tional Energy Research Scientific Computing (NERSC) cen¬ 
ter. 

4.1 Computational accuracy 

We first check the accuracy of total energy and atomic force 
of the DGDET software package by using P 54 shown in Eig- 
ure|^c) as an example. To simplify our discussion, we define 
the total energy error per atom AE (Hartree/atom) and maxi¬ 
mum atomic force error AE (Hartree/Bohr) as 

AE = (^dgdft _ £.abinit^ 

and 

AF = max jifCOFT _ ^abinitj 

respectively, where N and I correspond to the total number 
of atoms and an atom index, ^dgdft ^abinit repre¬ 
sent the total energy computed by DGDET and ABINIT re¬ 
spectively, and F}DGDft ^^yniNiT represent the Hellmann- 
Eeynman force on the /-th phosphorus atom in P 54 computed 
by DGDET and ABINIT, respectively. We find that neglecting 
the Pulay force in the atomic force leads to moderate deviation 
in the conserved energy in the AIMD simulation. The ABINIT 
results are obtained by setting the energy cutoff to 200 Hartree 
for the wavefunction to ensure full convergence. The kinetic 


Table 1 The accuracy of DGDFT in terms of the total energy error 
per atom AE (Hartree/atom) and the maximum atomic force error 
Af (Hartree/Bohr) in the DIAG and PEXSI methods with different 
energy cutoff Ecut (Hartree) of wavefunction and number of ALB 
functions per atom, compared with converged ABINIT calculations, 
#ALB means the number of ALB functions per atom. 


DGDET P54 DIAG PEXSI 


Ecut 

#ALB 

AE 

AF 

AE 

AF 

10 

28 

1.94E-02 

4.81E-02 

1.94E-02 

4.81E-02 

20 

28 

6.49E-04 

5.12E-03 

5.39E-04 

1.67E-02 

40 

10 

1.28E-03 

1.52E-02 

1.21E-03 

4.19E-03 

40 

12 

5.54E-04 

2.17E-03 

6.45E-04 

2.17E-03 

40 

15 

1.87E-04 

9.54E-04 

1.16E-04 

9.57E-04 

40 

19 

7.00E-05 

4.00E-04 

7.12E-05 

4.13E-04 

40 

28 

9.64E-06 

2.90E-04 

4.21E-05 

2.84E-04 

100 

28 

8.25E-06 

1.24E-04 

2.90E-05 

1.31E-04 

200 

28 

6.62E-06 

9.43E-05 

3.66E-05 

9.09E-05 


energy cutoff (denoted by Ecut) in the DGDET method is 
used to define the grid size for computing the ALBs as is in 
standard Kohn-Sham DET calculations using planewave ba¬ 
sis sets. Ecut is also directly related to the Legendre-Gauss- 
Lobatto (LGL) integration grid defined on each element and 
used to perform numerical integration as needed to construct 
the DG Hamiltonian matrix. 

Table shows that the total energy and atomic forces pro¬ 
duced by the DGDET method are highly accurate compared 
to the ABINIT results. In particular, the total energy error 
AE can be as small as 6.6 x 10^® Hartree/atom if the DIAG 
method is used to compute the charge density and 3.7 x 10^^ 
Hartree/atom if the PEXSI method is used to compute the 
charge density respectively. The maximum error of the atomic 
force can be as small as 9.4 x 10^^ Hartree/Bohr when DIAG 
is used and 9.1 x 10^^ Hartree/Bohr when PEXSI is used. 
These results are obtained when only a relatively small num¬ 
ber (28) of ALB functions per atom are used to construct the 
global DG Hamiltonian. The energy cutoff for the local wave- 
functions use to represent the ALB functions is set to 200 
Hartree in this case. Note that the accuracy of total energy 
and atomic force in DGDET depends on both the energy cut¬ 
off for local wavefunctions defined on an extended element 
and the number of ALB functions. We can see from Table [T] 
that the accuracy in energy and forces both improve as either 
the energy cutoff or the number of ALB functions increases. 
This clearly demonstrates that the ALB set produced by the 
DGDET methodology is systematically improvable. 

When the PEXSI method^^^®^ is used to compute the 
charge density, the accuracy of the computation is determined 
by the number of poles used in the pole expansion.^Sl We ex- 















amined the effect of the number of poles on the accuracy of 
total energy and atomic force in DGDFT, and found that suf¬ 
ficiently high accuracy (comparable to that achieved by us¬ 
ing the DIAG method to compute the charge density) can be 
achieved when the number of poles is set to 50. 

In our parallel efficiency tests and AIMD simulations, we 
use a energy cutoff of 40 Hartree for wavefunction and 15 
ALB functions per atom to achieve a good compromise be¬ 
tween accuracy and computational efficiency. For this par¬ 
ticular choice of energy cutoff and number of ALB func¬ 
tions, we are able to keep the total energy error under 1 x 
10^^ Hartree/atom and atomic force error under 1 x 10^^ 
Hartree/Bohr for large-scale ACPNRs. 

4.2 Parallel efficiency 

In the DGDFT method, each SCF iteration performs the fol¬ 
lowing three main steps of computation: (a) the generation of 
ALB functions, (b) the constrviction of DG Hamiltonian ma¬ 
trix via ALB functions and (c) the evaluation of the approxi¬ 
mate charge density, energy and atomic forces by either diag¬ 
onalizing the DG Hamiltonian (DIAG) or by using the PEXSI 
technique. Note that there are some additional steps such as 
the computation of energy, charge mixing or potential mixing, 
and intermediate data communication etc. The cost of these 
steps is included in the total wall clock time in Fig.|^(d). 

Fig. shows the strong parallel scaling of these three indi¬ 
vidual steps of computation, as well as the overall computa¬ 
tion, for three large scale ACPNRs (Piieo^ P 4320 and Piogoo) in 
terms of the wall clock time per SCF step. 

The wall clock time of the first two steps are independent of 
whether PEXSI or DIAG is used to evaluate electron density, 
energy and forces. Eig.j^a) and (b) show that they both scale 
nearly perfectly with respect to the number of processors used 
in the computation for all test problems we used. Eurthermore, 
The total wall clock time required to perform each one these 
steps is reduced to a few seconds even for P 10800 when more 
than 10,000 processors are used in the computation. 

Eig. [^c) and (d) show that the evaluation of the approxi¬ 
mate charge density using the DG Hamiltonian matrix domi¬ 
nates the total wall clock time per SCF iteration in the DGDFT 
methodology. For large-scale ACPNRs, the PEXSI method 
can effectively reduce the wall clock time compared to the 
DIAG method in the DGDFT methodology. Furthermore, us¬ 
ing the DIAG method with ScaLAPACK,!^ appears to limit 
the strong parallel scalability for large-scale ACPNRs to at 
most 5,000 processors on the Edison. Increasing the number 
of processors beyond that can lead to an increase in wall clock 
time. In contrast, the PEXSI method exhibits highly scalable 
performance. It can make efficient use of more than 20,000- 
50,000 processors on Edison for Pio 800 - It should be noted 
that the total wall clock time required for performing large- 




Number of processors 




Number of processors 


Fig. 3 (Color online) The change of wall clock time with respect to 
the number of processors used for the computation for three ACPNR 
systems of different sizes (f’ 2 l 60 > P4320 ™d Piosoo)- I®) Strong 
scaling of the generation of ALB functions step, (b) strong scaling 
of the DG Hamiltonian matrix construction step, (c) strong scaling 
the evaluation of the approximate charge density, energy and forces 
from the constructed DG Hamiltonian matrix, (d) strong scaling of 
the overall computation. The reported wall clock time is for one 
SCF iteration. The timing and scaling shown in (c) and (d) depend 
on whether DIAG (hollow markers) or PEXSI (solid markers) is 
used to evaluate physical quantities such as charge density, energy 
and forces. 
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scale ACPNRs containing thousands or tens of thousands of 
atoms is only about 10-25 seconds per SCF iteration. 


4.3 AIMD simulation 

Ab-initio molecular dynamics (AIMD) simulation capabil¬ 
ity has been implemented in the DGDFT method.® We use 
DGDFT AIMD simulation to study the thermodynamic sta¬ 
bility of ACPNRs. Using Pi go as an example, we perform an 
AIMD simulation to obtain a 2.5 picosecond (ps) trajectory 
of ACPNR dynamics with a time step of 2.0 femtosecond (fs) 
under canonical ensemble with the temperature hxed at 30 0 K 
controlled by a single level Nose-Hoover thermostat.®® The 
mass of the Nose-Hoover thermostat is chosen to be 85000 au. 
We use the GGA-PBE® exchange-correlation functional for 
this particular simulation. 

In Fig. we plot the temperature (computed by 
Jt/lNksT = Ek where is the kinetic energy) and total free 
energy of Pigo along the simulated trajectory. The temper¬ 
ature of the system reaches around 300 K after 1.5 ps. Al¬ 
though DGDFT only uses the Hellmann-Feynman force, we 
have observed that the drift of the conserved Hamiltonian in 
the Nose-Hoover thermostat is relatively small at 2.6 x lO^'* 
Hartree per atom per ps. 

We examine the electronic structures of Pigo during 2.5 ps 
at 300 K as shown in Fig.|^ Geometric structures and density 
of states (DOS) of three AIMD snapshots at f = 0.0, 0.6 and 
2.0 ps are plotted in Fig. |^b) and (c). In the initial configu¬ 
ration (t = 0.0 ps), the geometry of ACPNR is optimized first 
by using a gradient descent method with the Barzilai-Borwein 
line search technique® implemented in DGDFT. After t = 0.6 
ps, the ACPNR exhibits some local deformations due to the 
thermal perturbation introduced by the temperature. After t = 
2.0 ps, 2 X 1 edge reconstruction can be observed. We find 
that the electronic structures of ACPNRs are also affected by 
thermal perturbation and edge reconstruction. Fig.|^(a) indi¬ 
cates that the highest occupied molecular orbital (HOMO) en¬ 
ergy is shifted by around —0.3 eV, and the lowest unoccupied 
molecular orbital (LUMO) energy is shifted by around —0.2 
eV along the MD trajectory. The HOMO-LUMO energy gaps 
of Pigo are calculated to be 0.63, 0.44 and 0.38 eV at f = 0.0, 
0.6 and 2.0 ps, respectively, showing that the shift of the en¬ 
ergy level is more pronounced for the HOMO than the LUMO 
as shown in Fig.|^(c). Therefore, the edge-unpassivated ACP¬ 
NRs are also thermod ynami c ally unstable just like the edge- 
unpassivated ZZPNRs.®® This behavior is quite different 
from that of edge-unpassivated graphene nanoribbons.®® 
The r econs truction of edges in PNRs can modify their elec¬ 
tronic®® and transport® properties and make them potential 
candidate materials for phosphorene-based electronic devices, 
such as held effect transistors.® 
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Fig. 4 (Color online) (a) kinetic temperature and (b) total free 
energy along the AIMD trajectory for the ACPNR (Pigo). The 
simulation is performed for 2.5 ps at 300 K. 
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Fig. 5 (Color online) (a) HOMO, LUMO and energy gap for the 
ACPNR (Pi8o) along the AIMD trajectory, (h) Geometric structures 
and (c) density of states (DOS) at three snapshots of the simulation 
at f = 0.0, 0.6 and 2.0 ps. The Fermi level is marked by the green 
dotted line. 


In summary, we developed a massively parallel DGDFT (Dis¬ 
continuous Galerkin Density Functional Theory) methodol¬ 
ogy for efficient large-scale Kohn-Sham density functional 
theory (DFT) calculations based on the combination of the 
adaptive local basis (ALB) set and the pole expansion and se¬ 
lected inversion (PEXSI) technique. The DGDFT methodol¬ 
ogy can achieve a high basis set accuracy comparable to that 
provided by conventional plane wave calculations but with a 
small number of ALB basis functions per atom for large-scale 
electronic structure calculations that involve thousand or tens 
of thousands of atoms. Furthermore, the DGDFT methodol¬ 
ogy is highly scalable based on two levels of parallelization 
(intra- and inter-element parallelization), which can make effi¬ 
cient use of more than 50,000 processors on high performance 
machines for the systems studied here. Using ab-initio molec¬ 
ular dynamics calculations on armchair phosphorene nanorib¬ 
bons (ACPNRs), we find that a 2 x 1 edge reconstruction ap¬ 
pears in ACPNRs at room temperature to modify their elec¬ 
tronic structures for phosphorene-based nanoelectronics in the 
future. 
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